RERANDOMIZATION TO IMPROVE COVARIATE BALANCE IN EXPERIMENTS

By Kari Lock Morgan and Donald B. Rubin 
Duke University and Harvard University 

Randomized experiments are the "gold standard" for estimating 
causal effects, yet often in practice, chance imbalances exist in co- 
variate distributions between treatment groups. If covariate data are 
available before units are exposed to treatments, these chance im- 
balances can be mitigated by first checking covariate balance before 
the physical experiment takes place. Provided a precise definition of 
imbalance has been specified in advance, unbalanced randomizations 
can be discarded, followed by a rerandomization, and this process 
can continue until a randomization yielding balance according to the 
definition is achieved. By improving covariate balance, rerandomiza- 
tion provides more precise and trustworthy estimates of treatment 
effects. 

1. A brief history of rerandomization. Randomized experiments are the 
"gold standard" for estimating causal effects, because randomization bal- 
ances all potential confounding factors on average. However, if in a particular 
experiment, a randomization creates groups that are notably unbalanced on 
important covariates, should we proceed with the experiment, rather than 
rerandomizing and conducting the experiment on balanced groups? 

With $k$ independent covariates, the chance of at least one covariate showing
a "significant difference" between treatment and control groups, at significance
level $\alpha$, is $1 - (1 - \alpha)^k$. For a modest 10 covariates and a 5%
significance level, this probability is 40%. "Most experimenters on carry-
ing out a random assignment of plots will be shocked to find how far from 
equally the plots distribute themselves" [Fisher (1926)]. The danger of relying
on pure randomization to balance covariates has been described in
Seidenfeld (1981); Urbach (1985); Krause and Howard (2003); Rosenberger
and Sverdlov (2008); Rubin (2008a); Keele et al. (2009) and Worrall (2010). 
Also, there exists much discussion historically over whether randomization 
should be preferred over a purposefully balanced assignment [Gosset (1938); 
Yates (1939); Greenberg (1951); Harville (1975); Arnold (1986); Kempthorne 
(1986)]. Our view is that with rerandomization, we can retain the advantages 
of randomization, while also ensuring balance. 
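
The 40% figure quoted above can be checked directly. The following is a minimal sketch, assuming independent covariates each tested at the same level; the function name `prob_any_imbalance` is ours, introduced only for illustration.

```python
def prob_any_imbalance(k: int, alpha: float) -> float:
    """Chance that at least one of k independent covariates shows a
    'significant' difference at level alpha, i.e. 1 - (1 - alpha)^k."""
    return 1.0 - (1.0 - alpha) ** k


# How quickly the chance of some imbalance grows with the number of covariates.
for k in (1, 5, 10, 20):
    print(k, round(prob_any_imbalance(k, 0.05), 3))
```

With 10 covariates and a 5% level the value is roughly 0.40, matching the text.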

It is standard in randomized experiments today to collect covariate data 
and check for covariate balance, yet typically this is done after the experi- 
ment has started. If covariate data are available before the physical exper- 
iment has started, a randomization should be checked for balance before 
the physical experiment is conducted. If lack of balance is noted, as Gos- 
set stated, "it would be pedantic to continue with an arrangement of plots 
known beforehand to be likely to lead to a misleading conclusion" [Gosset 
(1938)]. It appears that Fisher would agree. In Rubin (2008a), Rubin re- 
counts the following conversation with his advisor Bill Cochran: 

Rubin: What if, in a randomized experiment, the chosen randomized allo- 
cation exhibited substantial imbalance on a prognostically important base- 
line covariate? 

Cochran: Why didn't you block on that variable? 

Rubin: Well, there were many baseline covariates, and the correct blocking 
wasn't obvious; and I was lazy at that time. 

Cochran: This is a question that I once asked Fisher, and his reply was 
unequivocal: 

Fisher (recreated via Cochran): Of course, if the experiment had not been 
started, I would rerandomize. 

A similar conversation between Fisher and Savage, wherein Fisher advo- 
cates rerandomization when faced with an undesirable randomization, is 
documented in Savage [(1962), page 88]. 

Checking covariates and rerandomizing when needed for balance has been 
advocated repeatedly. Sprott and Farewell (1993) recommend rerandomiza- 
tion when "obvious" lack of balance is observed. Rubin (2008a) suggests 
that if "important imbalances exist, rerandomize, and continue to do so 
until satisfied." For clinical trials, Worrall (2010) states that "if such base- 
line imbalances are found then the recommendation ... is to re-randomize 
in the hope that this time no baseline imbalances will occur." Cox (2009) 
and Bruhn and McKenzie (2009) have advocated rerandomization, suggest- 
ing either to do multiple randomizations and pick the "best," or to specify 
a bound for the difference in treatment and control covariate means for 
each covariate, following the "Big Stick" method of Soares and Wu (1985), 
and rerandomize until all differences are within these bounds. The latter 
rerandomization method was used in Maclure et al. (2006).

There are also many sources giving reasons not to rerandomize. Good
accounts of the debate over rerandomization can be found in Urbach (1985) 
and Raynor (1986). The most common critique of rerandomization is that 
forms of analysis utilizing Gaussian distribution theory are no longer valid 
[Fisher (1926); Anscombe (1948a); Grundy and Healy (1950); Holschuh 
(1980); Bailey (1983); Urbach (1985); Bailey (1986); Bailey and Rowley 
(1987)]. Rerandomization changes the distribution of the test statistic, most 
notably by decreasing the true standard error, thus traditional methods of 
analysis that do not take this into account will result in overly "conservative" 
inferences in the sense that tests will reject true null hypotheses less often 
than the nominal level and confidence intervals will cover the true value more 
often than the nominal level. However, randomization-based inference is still 
valid [Anscombe (1948a); Kempthorne (1955); Brillinger, Jones and Tukey 
(1978); Tukey (1993); Rosenberger and Lachin (2002); Moulton (2004)], because
the rerandomization can be accounted for during analysis.

All other critiques of rerandomization, of which we are aware, deal with 
"ad-hoc" rerandomization, that is, rejecting randomizations without spec- 
ifying a rejection criterion in advance. We only advocate rerandomization 
if the decision to rerandomize or not is based on a pre-specified criterion. 
By specifying an objective rerandomization rule before randomizing, and 
then analyzing results using randomization-based methods, we can, in most 
circumstances, finesse all existing criticisms of rerandomizing. 

Some may think that rerandomization is unnecessary with large sample
sizes, because as the sample size increases, the difference in covariate means
between groups gets smaller, essentially in inverse proportion to the square
root of the sample size. However, at the same rate, confidence intervals and
significance tests are getting more sensitive to small differences in outcome
means, which can be driven by small differences in covariate means.

Despite the ongoing discussion about rerandomization and the fact that 
it is widely used in practice [Holschuh (1980); Urbach (1985); Bailey and 
Rowley (1987); Imai, King and Stuart (2008); Bruhn and McKenzie (2009)], 
little has been published on the mathematical implications of rerandomiza- 
tion. Remarkably, it appears that no source even makes explicit the condi- 
tions under which rerandomization is valid. Although a few rerandomization 
methods have been proposed [Moulton (2004); Maclure et al. (2006); Bruhn 
and McKenzie (2009); Cox (2009)], the implications have not been theoreti- 
cally explored, to the best of our knowledge. The only published theoretical 
results accompanying a rerandomization procedure appear to be those in 
Cox (1982), which proposed rerandomization to lower the sampling variance
of covariance-adjusted estimates. Here we aim to fill these lacunae by
(a) making explicit the sufficient conditions under which rerandomization 
is valid, (b) describing in detail a principled procedure for implementing 
rerandomization and (c) providing corresponding theoretical results. 






K. L. MORGAN AND D. B. RUBIN 



[Figure 1: flowchart. Collect covariate data; specify criteria determining
whether a randomization is acceptable; (re)randomize units; check for
acceptable covariate balance (if no, rerandomize; if yes, conduct the
experiment); analyze with a randomization test.]

Fig. 1. Procedure for implementing rerandomization.



2. Rerandomization in general. 

2.1. Procedure. The procedure for implementing rerandomization is de- 
picted in Figure 1, and has the following steps: 

(1) Collect covariate data. 

(2) Specify a balance criterion determining when a randomization is ac- 
ceptable. 

(3) Randomize the units to treatment groups. 

(4) Check the balance criterion; if the criterion is met, go to Step (5). 
Otherwise, return to Step (3). 

(5) Conduct the experiment using the final randomization obtained in 
Step (4). 

(6) Analyze the results using a randomization test, keeping only simu- 
lated randomizations that satisfy the balance criterion specified in Step (2). 

Let x be the $n \times k$ covariate matrix representing $k$ covariates measured
on n experimental units. Here we assume that a sample of units has already 
been selected and is fixed. Because we are not considering the sampling 
mechanism, we are only interested in the extent to which a causal effect 
estimate obtained in this randomized experiment is a good estimate of the 
true causal effect within the selected sample. The x matrix includes all the 
observed covariates for which balance between groups is desired, which may 
include original covariates, and any functions of original covariates, such 
as transformations, powers and interactions. Let W be the n-dimensional 
treatment assignment vector indicating the treatment group for each unit. 
The rerandomization criterion is based on a row-exchangeable scalar function
of x and W,

$\varphi(x, W) = \begin{cases} 1, & \text{if } W \text{ is an acceptable randomization}, \\ 0, & \text{if } W \text{ is not an acceptable randomization}. \end{cases}$

The function $\varphi$ can vary depending on the relative importance of balancing
different covariates, on the level of covariate balance desired and on the
computational power available, but it is specified in advance.

More generally, we can define a set of acceptance criteria, $S = \{\varphi_s\}$, from
which we choose at each step, $s$, either deterministically or stochastically,
where this choice can depend on the step, so that, for example, we can
become more lenient as the steps increase without success. In this more
general situation, $\varphi_s(x, W)$ denotes the acceptance criterion for step $s$.
Although our theoretical results in Sections 2 and 3 hold for this more general
setup, in practice we expect that the common choice will be one function
for all steps, and to avoid notational clutter, we present results with one
criterion.

Once $\varphi$ has been specified, units are randomized to treatment groups
(Step 3). In the simplest form of rerandomization, this can be done with no 
restrictions; for example, randomly choose an assignment vector W from all 
possible vectors, or equivalently from all possible partitions of the units into 
groups. In practice, the initial randomization is typically done with some 
restriction to equalize treatment group sizes. 

Rerandomization is simply a tool that allows us to draw from some predetermined
set of acceptable randomizations, $\{W \mid \varphi(x, W) = 1\}$. Rerandomization
is analogous to rejection sampling, a way to draw from a set that
may be tedious to enumerate. Specifying a set of acceptable randomizations 
and then choosing randomly from this set is recommended by Kempthorne 
(1955, 1986) and Tukey (1993), and Moulton (2004) notes that rerandom- 
ization may be required for implementation of this idea when the set of 
acceptable randomizations is difficult to enumerate a priori. 
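
The rejection-sampling view of Steps (3) and (4) can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the helper names `rerandomize` and `mean_diff_within`, the single-covariate criterion, the tolerance of 2.0 and the covariate values are all assumptions made for the example.

```python
import random
from typing import Callable, List, Sequence


def rerandomize(x: Sequence[float],
                phi: Callable[[Sequence[float], List[int]], bool],
                rng: random.Random,
                max_draws: int = 100_000) -> List[int]:
    """Draw equal-sized assignments until the pre-specified criterion accepts one."""
    n = len(x)
    base = [1] * (n // 2) + [0] * (n - n // 2)
    for _ in range(max_draws):
        w = base[:]
        rng.shuffle(w)        # Step (3): randomize units to groups
        if phi(x, w):         # Step (4): check the balance criterion
            return w
    raise RuntimeError("no acceptable randomization found within max_draws")


def mean_diff_within(tol: float) -> Callable[[Sequence[float], List[int]], bool]:
    """Hypothetical criterion: |treated mean - control mean| <= tol."""
    def phi(x: Sequence[float], w: List[int]) -> bool:
        t = [xi for xi, wi in zip(x, w) if wi == 1]
        c = [xi for xi, wi in zip(x, w) if wi == 0]
        return abs(sum(t) / len(t) - sum(c) / len(c)) <= tol
    return phi


x = [23, 35, 41, 29, 52, 47, 31, 38, 44, 26]   # made-up covariate values
w = rerandomize(x, mean_diff_within(2.0), random.Random(0))
print(w)
```

The criterion is fixed before any assignment is drawn, which is the condition the text insists on; the loop itself never looks at outcomes.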

Within this framework, rerandomization simply generalizes classical experimental
designs. For the basic completely randomized experiment with
fixed sample sizes in each treatment group, $\varphi(x, W) = 1$ when the number
of units assigned to each group matches the predetermined group sizes.
For a randomized block experiment, $\varphi(x, W) = 1$ when predetermined numbers
of units within each block are assigned to each treatment group. For
a Latin square, $\varphi(x, W) = 1$ when the randomization satisfies the Latin
square design. These classical designs can be sampled from directly, so for
them rerandomization is equivalent but computationally inefficient; for other
functions $\varphi$, however, rerandomization may be a more straightforward technique.
Rerandomization can also be used together with any classical design. For 
example, in a medical experiment on hypertensive drugs, we may block on 
sex and a coarse categorization of baseline blood pressure, and use reran- 
domization to balance the remaining covariates, including fine baseline blood 
pressure. 

Researchers are free to choose any function $\varphi$, provided it is chosen in
advance. Section 2.3 describes the conditions necessary to maintain general
unbiasedness of simple point estimation. Section 3 recommends a particular
class of functions and studies theoretical properties of this choice and
Section 4 discusses some reasons for choosing an affinely invariant $\varphi$.

2.2. Analysis by randomization tests. Under most forms of rerandomization,
increasing balance in the covariates will typically create more precise
estimated treatment effects, making traditional Gaussian distribution-based 
forms of analysis statistically too conservative. However, the final data can 
be analyzed using a randomization test, maintaining valid frequentist prop- 
erties. As Fisher stated, "It seems to have escaped recognition that the 
physical act of randomization . . . affords the means, in respect of any par- 
ticular body of data, of examining a wider hypothesis in which no normality 
of distribution is implied" [Fisher (1935)]. This physical act of randomiza- 
tion need not be pure randomization, but any randomization scheme that 
can be replicated when conducting the randomization test. 

We are interested in the effect of treatment assignment, $W$, on an outcome,
$y$. Let $y_i(W_i)$ denote the $i$th unit's ($i = 1, \ldots, n$) potential outcome
under treatment assignment $W_i$, following the Rubin causal model [Rubin
(1974)]. Although rerandomization can be applied to any number of treatment
conditions, to convey essential ideas most directly, we consider only
two, and refer to these conditions as treatment and control. Let

$W_i = \begin{cases} 1, & \text{if treated}, \\ 0, & \text{if control}. \end{cases}$

Let $y_{\mathrm{obs}(W)}$ denote the vector of observed outcome values:

(1)  $y_{\mathrm{obs},i} = y_i(1) W_i + y_i(0)(1 - W_i)$,

where for notational simplicity the subscript obs means obs(W). Under the
sharp null hypothesis of no treatment effect on any unit, $y_i(1) = y_i(0)$ for
every $i$, and thus the vector $y_{\mathrm{obs}}$ is the same for every treatment
assignment $W$. Consequently, leaving $y_{\mathrm{obs}}$ fixed and simulating many
acceptable randomization assignments, $W$, we can empirically create the
distribution of any estimator, $g(x, W, y_{\mathrm{obs}})$, if the null hypothesis were true.
To account for the rerandomization, each simulated randomization must also
satisfy $\varphi(x, W) = 1$. Once the desired number of randomizations has been
simulated, the proportion of simulated randomizations with estimated treatment
effect as extreme or more extreme than that observed in the experiment is 
the p-value. Although a full permutation test (including all the acceptable 
randomizations) is necessary for an exact p-value, the number of simulated 
randomizations can be increased to provide a p-value with any desired level
of accuracy. This test can incorporate whatever rerandomization procedure 
was used, will preserve the significance level of the test [Moulton (2004)] and 
works for any estimator. Brillinger, Jones and Tukey (1978), Tukey (1993) 
and Rosenberger and Lachin [(2002), Chapter 7] suggest using randomiza- 
tion tests to assess significance when restricted randomization schemes are 
used. 
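
A randomization test that respects the rerandomization rule can be sketched as follows. This is a hedged illustration, not the authors' code: the estimator is the difference in means, the test is two-sided, and the names `diff_in_means`, `randomization_test` and `phi`, the tolerance of 2.0, and all data values are assumptions made for the example. It also assumes the criterion is satisfiable, since otherwise the loop would not terminate.

```python
import random
from typing import Callable, List, Sequence


def diff_in_means(v: Sequence[float], w: Sequence[int]) -> float:
    """Mean of v among treated units minus mean among control units."""
    t = [vi for vi, wi in zip(v, w) if wi == 1]
    c = [vi for vi, wi in zip(v, w) if wi == 0]
    return sum(t) / len(t) - sum(c) / len(c)


def randomization_test(x: Sequence[float],
                       y_obs: Sequence[float],
                       w_obs: Sequence[int],
                       phi: Callable[[Sequence[float], List[int]], bool],
                       n_sims: int = 2000,
                       seed: int = 0) -> float:
    """Two-sided p-value under the sharp null of no effect on any unit,
    simulating only randomizations that satisfy phi."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(y_obs, w_obs))
    w = list(w_obs)
    accepted = extreme = 0
    while accepted < n_sims:
        rng.shuffle(w)                 # keeps the original group sizes
        if not phi(x, w):              # discard unacceptable randomizations
            continue
        accepted += 1
        # Under the sharp null, y_obs is unchanged by the assignment.
        if abs(diff_in_means(y_obs, w)) >= observed:
            extreme += 1
    return extreme / n_sims


# Made-up data for illustration only.
x = [23, 35, 41, 29, 52, 47, 31, 38, 44, 26]
y_obs = [12.1, 9.8, 10.5, 11.9, 14.2, 10.9, 12.4, 10.1, 13.0, 9.5]
w_obs = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]


def phi(x: Sequence[float], w: List[int]) -> bool:
    """Hypothetical criterion: covariate means differ by at most 2.0."""
    return abs(diff_in_means(x, w)) <= 2.0


p_value = randomization_test(x, y_obs, w_obs, phi)
print(p_value)
```

The only change relative to an ordinary permutation test is the `phi` check inside the loop, which is what lets the reference distribution reflect the restricted set of assignments.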

Because analysis by a randomization test requires generating many acceptable
randomizations, computational time can be important to consider
in advance. Define $p_a = P(\varphi = 1)$ to be the proportion of acceptable
randomizations. The choice of $p_a$ involves a trade-off between better balance and
computational time; smaller values of pa ensure better balance, but they also 
imply a longer expected waiting time to obtain an acceptable randomiza- 
tion, at least without clever computational devices. The number of random- 
izations required to get one acceptable randomization follows a geometric 
distribution with parameter pa, so N simulated acceptable randomizations 
for a randomization test will require on average N/pa randomizations to be 
generated. 

The chosen pa must leave enough acceptable randomizations to perform 
a randomization test. In practice this is rarely an issue, because the number 
of possible randomizations is huge even for modest $n$. To illustrate, the
number of possible randomizations for $n = \{30, 50, 100\}$, randomizing to two
equally sized treatment groups, $\binom{n}{n/2}$, is on the order of
$\{10^{8}, 10^{14}, 10^{29}\}$, respectively. However, for small sample sizes,
care should be taken to ensure the number of acceptable randomizations does
not become too small, for example, less than 1000.
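
These counts are easy to reproduce. The short sketch below (the helper name `n_randomizations` is ours) evaluates the binomial coefficient for the three sample sizes mentioned and reports its order of magnitude.

```python
import math


def n_randomizations(n: int) -> int:
    """Number of ways to split n units into two equally sized groups."""
    return math.comb(n, n // 2)


for n in (30, 50, 100):
    count = n_randomizations(n)
    print(n, count, "order 10^%d" % math.floor(math.log10(count)))
```

Multiplying such a count by the acceptance proportion gives a rough idea of how many acceptable randomizations remain for the randomization test.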

By employing the duality between confidence intervals and tests, for addi- 
tive treatment effects a confidence interval can be produced from a random- 
ization distribution as the set of all values for which the observed data would 
not reject such a null hypothesized value [Lehmann and Romano (2005); 
Manly (2007), Section 3.5, Section 1.4]. Garthwaite (1996) provides an effi- 
cient algorithm for generating a confidence interval for additive effects from 
a randomization test. The assumption of additivity is statistically conser- 
vative, at least asymptotically, as implied by Neyman's [Splawa-Neyman 
(1990)] results on standard errors being overestimated when assuming it 
relative to the actual standard errors. A randomization test can be applied 
to any sharp null hypothesis, that is, a hypothesis such that the observed 
data implies specific values for all missing potential outcomes. 

When the covariates being balanced are correlated with the outcome 
variable, then rerandomization increases precision (Section 3.2). A random- 
ization test reflects this increase in precision. Standard asymptotic-based 
frequentist analysis procedures that do not take the rerandomization into 
account will be statistically conservative. Not only will distribution-based 
standard errors not incorporate the increase in precision, but the act of 
rerandomizing itself will increase the estimated standard error beyond that 
of pure randomization. If the total variance in the outcome is fixed, de- 
creasing the actual sampling variance between treatment group means (by 
ensuring better balance), increases the variance within groups, and it is this 
variance within groups that is traditionally used to estimate the standard 
error [Fisher (1926)]. Thus, although rerandomization decreases the true 
standard error, it actually increases the standard error as estimated by
traditional methods. For both of these reasons, the regular estimated standard
errors will overestimate the true standard error, and using the corresponding
distribution-based methods of analysis after rerandomization results in
overly wide confidence intervals and less powerful tests of hypotheses. 

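2.3. Maintaining an unbiased estimate. Although not needed to motivate
rerandomization, we assume one goal is to estimate the average treatment
effect

(2)  $\tau \equiv \overline{y(1)} - \overline{y(0)} = \frac{\sum_{i=1}^{n} y_i(1)}{n} - \frac{\sum_{i=1}^{n} y_i(0)}{n}$.

The fundamental problem in causal inference is that, because we only observe
$y_i(W_i)$ for each unit, we cannot calculate (2) directly, and we must
estimate $\tau$ using only $y_{\mathrm{obs}}$. In this section, we assume the Stable Unit
Treatment Value Assumption (SUTVA) [Rubin (1980)]: the potential outcomes
are fixed and do not change with different possible assignment vectors $W$.

The average treatment effect $\tau$ is usually estimated by the difference in
observed sample means,

$\hat{\tau} = \bar{Y}_{\mathrm{obs},T} - \bar{Y}_{\mathrm{obs},C}$,

where

$\bar{Y}_{\mathrm{obs},T} = \frac{\sum_{i=1}^{n} W_i y_i(1)}{\sum_{i=1}^{n} W_i}$ and $\bar{Y}_{\mathrm{obs},C} = \frac{\sum_{i=1}^{n} (1 - W_i) y_i(0)}{\sum_{i=1}^{n} (1 - W_i)}$.

Theorem 2.1. Suppose $\sum_{i=1}^{n} W_i = \sum_{i=1}^{n} (1 - W_i)$ and $\varphi(x, W) = \varphi(x, 1 - W)$;
then $E(\hat{\tau} \mid x, \varphi = 1) = \tau$.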

Proof. Under the specified conditions, $W$ and $1 - W$ are exchangeable.
Therefore, after rerandomization $E(W_i \mid x, \varphi = 1) = E(1 - W_i \mid x, \varphi = 1)$
for all $i$, so $E(W_i \mid x, \varphi = 1) = 1/2$ for all $i$. Hence

$E(\hat{\tau} \mid x, \varphi = 1) = E\left( \frac{\sum_{i=1}^{n} W_i y_i(1)}{n/2} - \frac{\sum_{i=1}^{n} (1 - W_i) y_i(0)}{n/2} \,\Big|\, x, \varphi = 1 \right)$

$= \frac{\sum_{i=1}^{n} E(W_i \mid x, \varphi = 1)\, y_i(1)}{n/2} - \frac{\sum_{i=1}^{n} (1 - E(W_i \mid x, \varphi = 1))\, y_i(0)}{n/2}$

$= \frac{\sum_{i=1}^{n} (1/2)\, y_i(1)}{n/2} - \frac{\sum_{i=1}^{n} (1/2)\, y_i(0)}{n/2}$

$= \tau$. □

Theorem 2.1 holds for all outcome variables. Corollary 2.2 follows by the
same logic.

Corollary 2.2. If $\sum_{i=1}^{n} W_i = \sum_{i=1}^{n} (1 - W_i)$ and $\varphi(x, W) = \varphi(x, 1 - W)$,
then $E(\bar{V}_T - \bar{V}_C \mid x, \varphi = 1) = 0$ for any observed or unobserved
covariate $V$.

If sample sizes are not fixed in advance, but each unit has $E(W_i \mid x) = 1/2$
in the initial randomization, $\hat{\tau}$ is only necessarily an unbiased estimate
under the assumption of additivity. As a small example under nonadditivity,
consider $x = (0, 1, 2)$, $y(1) = (1, 1, 0)$ and $y(0) = (0, 0, 1)$. When $\varphi(x, W) = 1$
if the difference in x means between the two groups is 0 and $\varphi = 0$ otherwise,
the only two acceptable randomizations are $W = (0, 1, 0)$ and $W = (1, 0, 1)$.
For either acceptable randomization, $\hat{\tau} = 1/2$, yet $\tau = 1/3$. This artificial
example also illustrates that if the treatment groups are of unequal size, $\hat{\tau}$
will not necessarily be an unbiased estimate after rerandomization. If the
treatment group includes two units and the control group one unit, and
$\varphi(x, W)$ is the same as before, then the only acceptable randomization is
$W = (1, 0, 1)$, and once again, $\hat{\tau} = 1/2$, whereas $\tau = 1/3$.
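
The three-unit example above is small enough to enumerate exhaustively. The sketch below (variable names are ours) lists every assignment with both groups nonempty, keeps those with exactly equal covariate means, and compares the resulting estimates with the true average effect using exact fractions.

```python
from fractions import Fraction
from itertools import product

x = [0, 1, 2]
y1 = [1, 1, 0]   # potential outcomes under treatment
y0 = [0, 0, 1]   # potential outcomes under control


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


tau = mean(y1) - mean(y0)   # true average treatment effect

acceptable = {}
for w in product([0, 1], repeat=3):
    if sum(w) in (0, 3):
        continue                         # both groups must be nonempty
    x_t = mean([xi for xi, wi in zip(x, w) if wi])
    x_c = mean([xi for xi, wi in zip(x, w) if not wi])
    if x_t == x_c:                       # criterion: covariate means agree exactly
        y_t = mean([a for a, wi in zip(y1, w) if wi])
        y_c = mean([b for b, wi in zip(y0, w) if not wi])
        acceptable[w] = y_t - y_c        # estimate under this assignment

print("tau =", tau)
for w, estimate in acceptable.items():
    print(w, estimate)
```

Only two assignments survive the criterion, and both give an estimate of 1/2 against a true effect of 1/3, so the bias does not average out.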

3. Rerandomization using Mahalanobis distance. To simplify the statement
of theoretical results, we assume the sample sizes for the treatment
and control groups are fixed in advance, with the fixed proportion of
treated units,

(3)  $p_w = \frac{\sum_{i=1}^{n} W_i}{n}$.

Let $\bar{X}_T - \bar{X}_C$ be the $k$-dimensional vector of the difference in covariate
means between the treatment and control groups,

(4)  $\bar{X}_T - \bar{X}_C = \frac{x'W}{n p_w} - \frac{x'(1 - W)}{n(1 - p_w)} = \frac{x'(W - p_w 1)}{n p_w (1 - p_w)}$.

We consider Mahalanobis distance as a rerandomization criterion because
it is an affinely invariant scalar measure of multivariate covariate balance.
Mahalanobis distance is defined by

(5)  $M \equiv (\bar{X}_T - \bar{X}_C)' [\mathrm{cov}(\bar{X}_T - \bar{X}_C)]^{-1} (\bar{X}_T - \bar{X}_C)$

(6)  $\phantom{M} = n p_w (1 - p_w) (\bar{X}_T - \bar{X}_C)' \mathrm{cov}(x)^{-1} (\bar{X}_T - \bar{X}_C)$,

where cov(x) represents the sample covariance matrix of x. The quantities $n$,
$p_w$ and cov(x) are known and constant across randomizations. If cov(x) is
singular, for example, if $k \ge n$, then $\mathrm{cov}(x)^{-1}$ can be replaced with the
pseudo-inverse of cov(x). For cluster randomized experiments, see Hansen
and Bowers (2008).

Due to the finite population central limit theorem, $\bar{X}_T - \bar{X}_C$ is asymptotically
multivariate normally distributed over its randomization distribution
[Erdos and Renyi (1959); Hajek (1960)]. Normality of $\bar{X}_T - \bar{X}_C$ is not necessary
for rerandomization, but assuming normality allows for the theoretical
results of this section. If $\bar{X}_T - \bar{X}_C$ is multivariate normal, then under pure
randomization, $M \sim \chi^2_k$ [Mardia, Kent and Bibby (1980), page 62]; $M$ is
the statistic used in Hotelling's test, but note that here $M$ follows a $\chi^2_k$
distribution because x is considered fixed.
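
The definition of M in (6) can be sketched with the standard library alone. This is an illustration under stated assumptions rather than a reference implementation: the function names `solve` and `mahalanobis` and the data are ours, cov(x) is taken to be the sample covariance with divisor n - 1, and cov(x) is assumed nonsingular (the pseudo-inverse case mentioned in the text is not handled).

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a z = b by Gaussian elimination with partial pivoting.
    Assumes a is square and nonsingular."""
    k = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for r in reversed(range(k)):                   # back substitution
        z[r] = (m[r][k] - sum(m[r][c] * z[c] for c in range(r + 1, k))) / m[r][r]
    return z


def mahalanobis(x: Sequence[Sequence[float]], w: Sequence[int]) -> float:
    """M = n p_w (1 - p_w) d' cov(x)^{-1} d, with d the difference in covariate means."""
    n, k = len(x), len(x[0])
    n_t = sum(w)
    n_c = n - n_t
    p_w = n_t / n
    col_mean = [sum(row[j] for row in x) / n for j in range(k)]
    # Sample covariance matrix of the covariates (divisor n - 1).
    cov = [[sum((row[i] - col_mean[i]) * (row[j] - col_mean[j]) for row in x) / (n - 1)
            for j in range(k)] for i in range(k)]
    # Difference in covariate means, treated minus control.
    d = [sum(row[j] for row, wi in zip(x, w) if wi) / n_t
         - sum(row[j] for row, wi in zip(x, w) if not wi) / n_c
         for j in range(k)]
    z = solve(cov, d)                              # cov(x)^{-1} d
    return n * p_w * (1 - p_w) * sum(di * zi for di, zi in zip(d, z))


# Made-up data: six units, two covariates.
x = [[1, 10], [2, 14], [3, 9], [4, 20], [5, 13], [6, 18]]
w = [1, 0, 1, 0, 1, 0]
m_original = mahalanobis(x, w)

# An invertible affine transformation of the covariates should leave M unchanged.
x_affine = [[2 * a + 3, 0.5 * b - a + 1] for a, b in x]
m_affine = mahalanobis(x_affine, w)
print(m_original, m_affine)
```

The second call illustrates the affine invariance that motivates the choice of M: rescaling or mixing the covariates does not change which randomizations are accepted.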

A randomization is deemed "acceptable" whenever $M$ falls below a certain
threshold, $a$. Let $p_a$ be the proportion of randomizations that are acceptable,
so that $P(M \le a) = p_a$. Either $a$ or $p_a$ can be specified a priori, and then
the other is fixed either using $M \sim \chi^2_k$ if sample sizes are large enough
or using an empirical distribution of $M$ achieved through simulation. The
rerandomization criterion, $\varphi_M$, is

(7)  $\varphi_M(x, W) = \begin{cases} 1, & \text{if } M \le a, \\ 0, & \text{if } M > a. \end{cases}$

3.1. Covariate balance under $\varphi_M$.

Theorem 3.1. Assume rerandomization is conducted using $\varphi_M$ with
$p_w = 1/2$, and the covariate means are multivariate normal; then

(8)  $\mathrm{cov}(\bar{X}_T - \bar{X}_C \mid x, \varphi_M = 1) = v_a\, \mathrm{cov}(\bar{X}_T - \bar{X}_C \mid x)$,

where

(9)  $v_a \equiv \frac{2}{k} \times \frac{\gamma(k/2 + 1, a/2)}{\gamma(k/2, a/2)} = \frac{P(\chi^2_{k+2} \le a)}{P(\chi^2_k \le a)}$

and $\gamma$ denotes the incomplete gamma function: $\gamma(b, c) \equiv \int_0^c y^{b-1} e^{-y}\, dy$.
The proof of Theorem 3.1 is in the Appendix.
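
The factor in (9) can be evaluated numerically from the chi-squared form. The sketch below is one possible way to do so with the standard library only, under the normal approximation of Theorem 3.1; the function names `chi2_cdf`, `threshold` and `v_a`, the series evaluation and the bisection search are our choices for illustration, and a statistics library would normally be used instead.

```python
import math


def chi2_cdf(a: float, k: int) -> float:
    """P(chi^2_k <= a), via the power series for the regularized
    lower incomplete gamma function (adequate for moderate a)."""
    s, x = k / 2.0, a / 2.0
    if x <= 0.0:
        return 0.0
    term = 1.0 / s
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= x / (s + n)
        total += term
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))


def threshold(p_a: float, k: int) -> float:
    """Find a such that P(chi^2_k <= a) = p_a, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p_a:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, k) < p_a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def v_a(p_a: float, k: int) -> float:
    """Ratio P(chi^2_{k+2} <= a) / P(chi^2_k <= a) at the threshold a implied by p_a."""
    a = threshold(p_a, k)
    return chi2_cdf(a, k + 2) / p_a


# Percent reduction in covariate variance, 100 (1 - v_a), for a few settings.
for k in (2, 10):
    for p_a in (0.1, 0.01, 0.001):
        print(k, p_a, round(100.0 * (1.0 - v_a(p_a, k)), 1))
```

For k = 2 the chi-squared distribution functions have closed forms, which gives a convenient check: with an acceptance probability of 1/2 the factor equals 1 - ln 2.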

In the field of matching, emphasis has been placed on "percent reduction 
in bias" [Cochran and Rubin (1973)]. In the context of randomized exper- 
iments there is no bias, and rerandomization instead reduces the sampling 
variance of the difference in covariate means, yielding differences that are 
more closely concentrated around 0. Define the percent reduction in vari- 
ance, the percentage by which rerandomization reduces the randomization 
variance of the difference in means, for each covariate, $x_j$, by

(10)  $100 \left( \frac{\mathrm{var}(\bar{X}_{j,T} - \bar{X}_{j,C} \mid x) - \mathrm{var}(\bar{X}_{j,T} - \bar{X}_{j,C} \mid x, \varphi = 1)}{\mathrm{var}(\bar{X}_{j,T} - \bar{X}_{j,C} \mid x)} \right)$.

By Theorem 3.1, the percent reduction in variance for each covariate, and
for any linear combination of these covariates, is

(11)  $100(1 - v_a)$,

and is shown as a function of $k$ and $p_a$ in Figure 2, where by (9),
$0 \le v_a \le 1$. The lower the acceptance probability and the fewer covariates
being balanced, the larger the percent reduction in variance.

[Figure 2: plot of the percent reduction in variance, $100(1 - v_a)$, against
$k$, the number of covariates (horizontal axis ticks from 10 to 50).]

Fig. 2. The percent reduction in variance for each covariate difference in means, as a
function of the number of covariates and the proportion of randomizations accepted.

Notice that Theorem 3.1 holds for any covariate distribution, as long 
as the sample size is large enough for the central limit theorem to ensure 
normally distributed covariate means. An exact value is not needed, and an 
estimate is used only to guide the choice of $p_a$; it has no influence on the
validity of resulting inferences. 

3.2. Precision of the estimated treatment effect. Rerandomization im- 
proves precision, provided the outcome and covariates are correlated. Thus 
researchers can increase the power of tests and decrease the width of confi- 
dence intervals simply at the expense of computational time. 

Theorem 3.2. If (a) rerandomization is conducted using $\varphi_M$ with $p_w = 1/2$,
(b) the covariate and outcome means are normally distributed, and (c)
the treatment effect is additive, then the percent reduction in variance of $\hat{\tau}$ is

(12)  $100(1 - v_a)R^2$,

where $R^2$ represents the squared multiple correlation between $y$ and x within
a treatment group and $v_a$ is as defined in (9).

Proof. Regardless of the true relationship between the outcome and
covariates, by additivity we can write

(13)  $y_i(W_i) \mid x_i = \beta_0 + \beta' x_i + \tau W_i + e_i$,

where $\beta_0 + \beta' x_i$ is the projection of $y_i$ onto the space spanned by $(1, x)$,
and $e_i$ is a residual that encompasses any deviations from the linear model.
Then the estimated treatment effect, $\hat{\tau}$, can be expressed as

(14)  $\hat{\tau} = \beta'(\bar{X}_T - \bar{X}_C) + \tau + (\bar{e}_T - \bar{e}_C)$.

Because $\tau$ is constant and the first and last terms are uncorrelated, we can
express the variance of $\hat{\tau}$ as

(15)  $\mathrm{var}(\hat{\tau}) = \mathrm{var}(\beta'(\bar{X}_T - \bar{X}_C)) + \mathrm{var}(\bar{e}_T - \bar{e}_C) = \beta' \mathrm{cov}(\bar{X}_T - \bar{X}_C) \beta + \mathrm{var}(\bar{e}_T - \bar{e}_C)$.

By Theorem 3.1, rerandomization modifies the first term by the factor $v_a$.
Because under normality, orthogonality implies independence, the difference
in residual means is independent of the difference in covariate means, and
thus rerandomization has no effect on the second term. Therefore, the variance
of $\hat{\tau}$ after rerandomization restricting $M \le a$ is

(16)  $\mathrm{var}(\hat{\tau} \mid x, M \le a) = \beta' \mathrm{cov}(\bar{X}_T - \bar{X}_C \mid x, M \le a) \beta + \mathrm{var}(\bar{e}_T - \bar{e}_C \mid x, M \le a) = v_a \beta' \mathrm{cov}(\bar{X}_T - \bar{X}_C \mid x) \beta + \mathrm{var}(\bar{e}_T - \bar{e}_C \mid x)$.

Let $\sigma_e^2$ be the variance of the residuals and $\sigma_y^2$ be the variance of the outcome
within each treatment group, where $\sigma_e^2 = \sigma_y^2 (1 - R^2)$. Thus

(17)  $\mathrm{var}(\bar{e}_T - \bar{e}_C \mid x) = \frac{\sigma_e^2}{n p_w (1 - p_w)} = \frac{\sigma_y^2 (1 - R^2)}{n p_w (1 - p_w)}$,

and

(18)  $\beta' \mathrm{cov}(\bar{X}_T - \bar{X}_C \mid x) \beta = \mathrm{var}(\hat{\tau} \mid x) - \mathrm{var}(\bar{e}_T - \bar{e}_C \mid x) = \frac{\sigma_y^2}{n p_w (1 - p_w)} - \frac{\sigma_y^2 (1 - R^2)}{n p_w (1 - p_w)} = \frac{\sigma_y^2 R^2}{n p_w (1 - p_w)}$.

Therefore by (16), (17) and (18), the variance of $\hat{\tau}$ after rerandomization is

$\mathrm{var}(\hat{\tau} \mid x, M \le a) = v_a \beta' \mathrm{cov}(\bar{X}_T - \bar{X}_C \mid x) \beta + \mathrm{var}(\bar{e}_T - \bar{e}_C \mid x)$

$= \frac{v_a \sigma_y^2 R^2}{n p_w (1 - p_w)} + \frac{\sigma_y^2 (1 - R^2)}{n p_w (1 - p_w)}$

$= (1 - (1 - v_a) R^2) \frac{\sigma_y^2}{n p_w (1 - p_w)}$

$= (1 - (1 - v_a) R^2)\, \mathrm{var}(\hat{\tau} \mid x)$.

Thus the percent reduction in variance is $100(1 - (1 - (1 - v_a) R^2)) = 100(1 - v_a) R^2$. □

[Figure 3: plots of the percent reduction in variance.]

Fig. 3. The percent reduction in variance for the estimated treatment effect, as a function
of the acceptance probability, the number of covariates, and $R^2$.



The percent reduction in variance for the estimated treatment effect,
shown as a function of $k$, $p_a$ and $R^2$ in Figure 3, is simply the percent
reduction in variance for each covariate, scaled by $R^2$. Because under the
specified conditions $\hat{\tau}$ is unbiased by Theorem 2.1, $100(1 - v_a)R^2$ is not only
the percent reduction in variance in the estimated treatment effect, but also
the percent reduction in mean square error (MSE).
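
As a small worked illustration of (9) and (12), take $k = 2$, where the chi-squared distribution functions have closed forms. The values $p_a = 1/2$ and $R^2 = 0.5$ are chosen here only to keep the arithmetic simple; they are not recommendations from the text.

```latex
% Illustrative values: k = 2, p_a = 1/2, R^2 = 0.5
P(\chi^2_2 \le a) = 1 - e^{-a/2} = \tfrac{1}{2}
  \;\Longrightarrow\; a = 2 \ln 2 ,
\qquad
P(\chi^2_4 \le a) = 1 - e^{-a/2}\Bigl(1 + \tfrac{a}{2}\Bigr)
  = \tfrac{1}{2}\,(1 - \ln 2) ,
\\[4pt]
v_a = \frac{P(\chi^2_4 \le a)}{P(\chi^2_2 \le a)} = 1 - \ln 2 \approx 0.307 ,
\qquad
100\,(1 - v_a) \approx 69.3 ,
\qquad
100\,(1 - v_a)\,R^2 = 100\,(\ln 2)(0.5) \approx 34.7 .
```

So, under these assumed values, rejecting half of all randomizations would remove about 69% of the sampling variance of each covariate mean difference and about 35% of the variance of the estimated treatment effect.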

If regression (i.e., analysis of covariance) is used to adjust for imbalance 
in a completely randomized experiment, the percent reduction in variance is 



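(19)  [display equation illegible in this copy; the surviving fragments show a
leading factor of 100 and terms in $M$ and $n$]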


[Cox (1982)], where $M$ is as in (6). Comparing (19) to (10), we see that
rerandomization can increase precision more than regression adjustment because
there is no estimation of regression coefficients with the former. Note
that the highest percent reduction in variance achievable by either
rerandomization or regression is $100R^2$, achieved with perfect covariate mean
balance.



4. Affinely invariant rerandomization criteria. In this section we explore
the theoretical implications of choosing an affinely invariant rerandomization
criterion, meaning that for any affine transformation of x, $a + bx$,
$\varphi(x, W) = \varphi(a + bx, W)$. Measures based on inner products, such as
Mahalanobis distance or the estimated best linear discriminant, are affinely
invariant, as are criteria based on propensity scores estimated by linear
logistic regression [Rubin and Thomas (1992)]. Results in this section parallel
those for affinely invariant matching methods [Rubin and Thomas (1992)].

In the previous sections, we regarded x as fixed, and only the randomization
vector, $W$, was random. In this section, to use ellipsoidal symmetry
of x, we regard both x and $W$ as random, so expectations are over repeated
draws of x and repeated randomizations.

Theorem 4.1. If $\varphi$ is affinely invariant, and if x is ellipsoidally
symmetric, then

\[
E(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C \mid \varphi = 1) = E(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C) = \mathbf{0} \tag{20}
\]

and

\[
\operatorname{cov}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C \mid \varphi = 1) \propto \operatorname{cov}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C). \tag{21}
\]

Proof. First, by ellipsoidal symmetry there is an affine transformation
of x to a canonical form with mean (center) zero and covariance (inner
product) I, the k-dimensional identity matrix. The distributions of the
matrix x in the treated group of size $np_w$ and the control group of size
$n(1 - p_w)$ are both independent and identically distributed samples from
this zero-centered spherical distribution. Any affinely invariant rule for
selecting subsets of treated and control units will be a function of
affinely invariant statistics in the treatment and control groups that are
also zero-centered and spherically symmetric. Applying $\varphi$ creates
concentric zero-centered sphere(s) that partition the space of these
statistics into regions where $\varphi = 1$ and $\varphi = 0$, and therefore
the distribution of such statistics remains zero-centered and spherically
symmetric. Transforming back to the original form completes the proof. □

Corollary 4.2. If $\varphi$ is affinely invariant and if x is ellipsoidally
symmetric, then rerandomization leads to unbiased estimates of any linear
function of x.

Note that, unlike Corollary 2.2, Corollary 4.2 applies no matter how the
sample sizes are chosen.

Corollary 4.3. If $\varphi$ is affinely invariant and if x is ellipsoidally
symmetric, then

\[
\operatorname{cor}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C \mid \varphi = 1) = \operatorname{cor}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C). \tag{22}
\]

One possible method of rerandomization, suggested by Moulton (2004),
Maclure et al. (2006), Bruhn and McKenzie (2009) and Cox (2009), is to
place bounds separately on each entry of $\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C$
and ensure that each covariate difference is within its specified caliper.
However, this method is not affinely invariant and will generally destroy
the correlational structure of $\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C$,
even when x is ellipsoidally symmetric.
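
The contrast between an affinely invariant criterion and separate calipers can be seen in a small simulation. The sketch below assumes, purely for illustration, that the difference in means for k = 2 covariates is exactly bivariate normal with unit variances and correlation 0.8; the threshold, the caliper width and all function names are hypothetical choices, not values from the paper. It compares the correlation of the accepted differences under a Mahalanobis rule and under a per-covariate caliper rule.

```python
import math
import random

RHO = 0.8       # assumed correlation between the two mean differences
A = 0.45        # illustrative Mahalanobis threshold
CALIPER = 0.5   # illustrative caliper applied to each difference separately

def draw_difference(rng):
    """One draw of a bivariate normal difference with unit variances and correlation RHO."""
    z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
    return z0, RHO * z0 + math.sqrt(1 - RHO ** 2) * z1

def mahalanobis(d0, d1):
    """d' Sigma^{-1} d when Sigma has unit variances and correlation RHO."""
    return (d0 * d0 - 2 * RHO * d0 * d1 + d1 * d1) / (1 - RHO ** 2)

def correlation(pairs):
    """Sample correlation of a list of (d0, d1) pairs."""
    n = len(pairs)
    m0 = sum(p[0] for p in pairs) / n
    m1 = sum(p[1] for p in pairs) / n
    s00 = sum((p[0] - m0) ** 2 for p in pairs)
    s11 = sum((p[1] - m1) ** 2 for p in pairs)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in pairs)
    return s01 / math.sqrt(s00 * s11)

rng = random.Random(1)
draws = [draw_difference(rng) for _ in range(200_000)]

# Keeping only acceptable draws mimics rerandomizing until the criterion is met.
kept_mahalanobis = [d for d in draws if mahalanobis(*d) <= A]
kept_caliper = [d for d in draws if abs(d[0]) <= CALIPER and abs(d[1]) <= CALIPER]

cor_pure = correlation(draws)
cor_mahalanobis = correlation(kept_mahalanobis)
cor_caliper = correlation(kept_caliper)
print(cor_pure, cor_mahalanobis, cor_caliper)
```

Under these assumptions the Mahalanobis rule leaves the correlation close to its value under pure randomization, in line with Corollary 4.3, while the caliper rule truncates to a square and pulls the correlation toward zero.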

Analogous to "Equal Percent Bias Reducing" (EPBR) matching meth- 
ods [Rubin (1976)], a rerandomization method is said to be "Equal Percent 
Variance Reducing" (EPVR) if the percent reduction in variance is the same 
for each covariate. 

Corollary 4.4. If $\varphi$ is affinely invariant and if x is ellipsoidally
symmetric, then rerandomization is EPVR for x and any linear function of x.

Rerandomization methods that are not affinely invariant could increase 
the variance of some linear combinations of covariates [Rubin (1976)]. 
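
The EPVR property can be checked empirically. The following sketch rerandomizes a fixed set of n = 50 units with k = 2 simulated covariates on very different scales, accepting an allocation when its Mahalanobis distance is at most a, and then estimates the percent reduction in variance of each covariate's difference in means. The data, the threshold a = 0.45, the number of draws and the assumed form M = n p_w (1 - p_w) d' S^{-1} d are illustrative assumptions; the theoretical value uses the k = 2 closed form of the incomplete gamma ratio from Theorem 3.1.

```python
import math
import random

rng = random.Random(4)
n, n_t = 50, 25
n_c = n - n_t

# Hypothetical covariates on different scales, correlated with each other.
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [50 + 5 * (0.6 * u + 0.8 * rng.gauss(0, 1)) for u in x0]

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

s00, s01, s11 = cov(x0, x0), cov(x0, x1), cov(x1, x1)
det = s00 * s11 - s01 * s01
tot0, tot1 = sum(x0), sum(x1)
scale = n_t * n_c / n   # equals n p_w (1 - p_w)

def randomize():
    """One complete randomization: returns (d0, d1, M)."""
    treated = rng.sample(range(n), n_t)
    t0 = sum(x0[i] for i in treated)
    t1 = sum(x1[i] for i in treated)
    d0 = t0 / n_t - (tot0 - t0) / n_c
    d1 = t1 / n_t - (tot1 - t1) / n_c
    m = scale * (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det
    return d0, d1, m

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

a = 0.45  # illustrative threshold; accepts roughly 20% of draws when k = 2
draws = [randomize() for _ in range(20_000)]
accepted = [d for d in draws if d[2] <= a]   # same as rerandomizing until M <= a

priv = []  # percent reduction in variance, one entry per covariate
for j in range(2):
    var_pure = variance([d[j] for d in draws])
    var_rerand = variance([d[j] for d in accepted])
    priv.append(100 * (1 - var_rerand / var_pure))

# Theorem 3.1 value for k = 2, where gamma(2, t) / gamma(1, t) has a closed form.
v_a = (1 - math.exp(-a / 2) * (1 + a / 2)) / (1 - math.exp(-a / 2))
priv_theory = 100 * (1 - v_a)
print(priv, priv_theory)
```

Up to simulation noise and the normal approximation for the difference in means, the two estimated reductions should be close to each other and to 100(1 - v_a), even though the covariates are measured on different scales.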

Although affinely invariant methods have desirable properties in general, 
they are not always preferred. For example, if covariates are known to vary 
in importance, a rerandomization method that is not EPVR may be more 
desirable, allowing greater percent reduction in variance for more important 
covariates. Rerandomization criteria that take into account covariates of 
varying importance are discussed in Lock [(2011), Chapter 4]. 

5. Discussion. 

5.1. Alternatives for balancing covariates. Rerandomization is certainly 
not the only way to balance covariates before the experiment. 

With only a few categorical covariates, simple blocking can successfully 
balance all covariates, and there is no need for rerandomization. With many 
covariates each taking on many values, however, blocking on all covari- 
ates can be impossible, and in this case we recommend blocking on the 
most important covariates, and rerandomizing to balance the components of 
the covariates orthogonal to the blocks. Blocking and rerandomization can, 
and we feel should, be used together. Multivariate matching [Greevy et al. 
(2004); Rubin (2006); Ho et al. (2007); Imai, King and Nall (2009); Xu and 
Kalbfleisch (2010)] is a special case of blocking that can better handle many 
covariates. 

Restricted (or constrained) randomization [Yates (1948); Grundy and 
Healy (1950); Youden (1972); Bailey (1983)] restricts the set of acceptable 
randomizations in a way that preserves the validity of asymptotic-based 
distributional methods of analysis. However, most work on restricted ran- 
domization is specific to agricultural plots, and apparently has not been 
extended to multiple covariates. Blocking, matching and restricted random- 
ization can all also be implemented through rerandomization by specifying 
the set of acceptable randomizations through $\varphi$.

The Finite Selection Model (FSM) [Morris (1979); Morris and Hill (2000)]
provides balance for multiple covariates, but provides a fixed amount of
balance in a fixed amount of computational time. Rerandomization has the
flexibility to choose the desired tradeoff between balance and
computational time. More details comparing FSM with rerandomization are in
Lock [(2011), Section 5.5].

Covariate-adaptive randomization schemes [Efron (1971); White and Freed- 
man (1978); Pocock and Simon (1975); Pocock (1979); Simon (1979); Birkett 
(1985); Aickin (2001); Atkinson (2002); Scott et al. (2002); McEntegart (2003); 
Rosenberger and Sverdlov (2008)] are designed for clinical trials with se- 
quential treatment allocation over extended periods of time. Rerandomiza- 
tion as proposed here is not applicable to sequential allocation, and instead 
readers interested in such trials can refer to the above sources. 

If covariates are not balanced before the experiment, post-hoc methods 
such as regression adjustment are commonly used, which rely on assumptions 
that often cannot be verified [Tukey (1993); Freedman (2008)]. Moreover, 
unlike post-hoc methods, rerandomization is conducted entirely at the design 
stage, and so cannot be influenced by outcome data. Tukey (1993) and Rubin 
(2008b) give convincing reasons for why as much as possible should be done 
in the design phase of an experiment, before outcome data are available, 
rather than in the analysis stage when the researcher has the potential to 
bias the results, consciously or unconsciously. 

5.2. Extensions and additional considerations. For multiple treatment 
groups, any of the test statistics commonly used in multivariate analysis 
of variance (MANOVA) can be used to measure balance. The standard 
statistics are all equivalent to Mahalanobis distance in the special case of 
two groups. Extensions for multiple treatment groups are discussed in Lock 
[(2011), Section 5.2]. 

For unbiased estimates using rerandomization with treatment groups of 
unequal sizes, multiple treatment groups of equal size can be created, and 
then merged as needed after the rerandomization procedure, but before the 
physical experiment. If extra units are discarded to form equal sized treat- 
ment groups and rerandomization is employed, precision can actually in- 
crease if covariates are highly correlated with the outcome [Lock (2011), 
Section 5.3]. 

In a Bayesian analysis, as long as all covariates relevant to
$\varphi(\mathbf{x}, \mathbf{W})$ are conditioned on, the design is
ignorable [Rubin (1978)], and theoretically, the analysis can proceed as
usual.

6. Conclusion. Randomization balances covariates across treatment 
groups, but only on average, and in any one experiment covariates may 
be unbalanced. Rerandomization provides a simple and intuitive way to im- 
prove covariate balance in randomized experiments. 

To perform rerandomization, a criterion determining whether a
randomization is acceptable needs to be specified. For unbiasedness, this
rule needs to be symmetric regarding the treatment groups. If the criterion
is affinely invariant, then for ellipsoidally symmetric distributions,
balance improvement will be the same for all covariates (and all linear
combinations of the covariates), and correlations between covariate
differences in means will be maintained. One such criterion is to
rerandomize whenever Mahalanobis distance exceeds a certain threshold.

When the covariates are correlated with the outcome, rerandomization in- 
creases precision. If the analysis reflects the rerandomization procedure, this 
leads to more precise estimates, more powerful tests and narrower confidence 
intervals. 

APPENDIX 

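Proof of Theorem 3.1. Because $M \sim \chi^2_k$ under pure randomization
when the covariate means are normally distributed, rerandomization affects
the mean of M as follows:

\begin{align}
E(M \mid \mathbf{x}, M \le a)
&= \frac{\bigl(1/(\Gamma(k/2)\,2^{k/2})\bigr) \int_0^a y^{k/2} e^{-y/2}\,dy}
        {\bigl(1/(\Gamma(k/2)\,2^{k/2})\bigr) \int_0^a y^{k/2-1} e^{-y/2}\,dy} \notag \\
&= \frac{\int_0^a (y/2)^{k/2} e^{-y/2}\,dy}
        {(1/2) \int_0^a (y/2)^{k/2-1} e^{-y/2}\,dy} \notag \\
&= 2 \times \frac{\gamma(k/2 + 1, a/2)}{\gamma(k/2, a/2)}. \tag{23}
\end{align}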
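
Equation (23) can be checked numerically. The sketch below evaluates the lower incomplete gamma function by its standard power series and compares the right-hand side of (23) with a Monte Carlo estimate of the mean of a chi-square variable truncated at a. The function names, the choice k = 5 and a = 2, and the number of draws are illustrative assumptions.

```python
import math
import random

def lower_gamma(s, x, terms=200):
    """Lower incomplete gamma function gamma(s, x), via the series
    x^s e^{-x} * sum_{n>=0} x^n / (s (s+1) ... (s+n))."""
    term = 1.0 / s
    total = term
    for n in range(1, terms):
        term *= x / (s + n)
        total += term
    return x ** s * math.exp(-x) * total

def truncated_mean(k, a):
    """Right-hand side of (23): E(M | M <= a) for M chi-square with k d.f."""
    return 2.0 * lower_gamma(k / 2 + 1, a / 2) / lower_gamma(k / 2, a / 2)

def simulated_truncated_mean(k, a, draws, rng):
    """Monte Carlo estimate of E(M | M <= a), with M a sum of k squared standard normals."""
    kept = []
    for _ in range(draws):
        m = sum(rng.gauss(0, 1) ** 2 for _ in range(k))
        if m <= a:
            kept.append(m)
    return sum(kept) / len(kept)

k, a = 5, 2.0
formula = truncated_mean(k, a)
simulated = simulated_truncated_mean(k, a, 200_000, random.Random(2))
print(formula, simulated)
```

For k = 2 the chi-square distribution is exponential with mean 2, so the same quantity has the closed form (2 - e^{-a/2}(a + 2)) / (1 - e^{-a/2}), which gives a second, deterministic check on the series implementation.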

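To prove (8), we convert the covariates to canonical form [Rubin and
Thomas (1992)]. Let $\Sigma = \operatorname{cov}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C \mid \mathbf{x})$, and define

\[
\mathbf{Z} = \Sigma^{-1/2} (\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C), \tag{24}
\]

where $\Sigma^{-1/2}$ is the Cholesky square root of $\Sigma^{-1}$, so
$(\Sigma^{-1/2})' \Sigma^{-1/2} = \Sigma^{-1}$. By the assumption of normality,

\[
\mathbf{Z} \mid \mathbf{x} \sim N_k(\mathbf{0}, \mathbf{I}).
\]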

Due to normality, uncorrelated implies independent and thus the elements 
of Z are independent and identically distributed (i.i.d.) standard normals. 
Therefore, the elements of Z are exchangeable. 

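By (5), $M = \mathbf{Z}'\mathbf{Z} = \sum_{j=1}^{k} Z_j^2$. Therefore for each j we have

\begin{align}
\operatorname{var}(Z_j \mid \mathbf{x}, M \le a)
&= E(Z_j^2 \mid \mathbf{x}, M \le a) \notag \\
&= \frac{E(M \mid \mathbf{x}, M \le a)}{k} \tag{25} \\
&= \frac{2}{k} \times \frac{\gamma(k/2 + 1, a/2)}{\gamma(k/2, a/2)} \notag \\
&= v_a, \tag{26}
\end{align}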

where (25) follows from the exchangeability of the elements of Z.
After enforcing $M \le a$, the elements of Z are no longer independent (they
will be negatively correlated in magnitude), but with signs they remain
uncorrelated due to symmetry:

\begin{align}
\operatorname{cov}(Z_i, Z_j \mid \mathbf{x}, M \le a)
&= E(Z_i Z_j \mid \mathbf{x}, M \le a)
   - E(Z_i \mid \mathbf{x}, M \le a)\, E(Z_j \mid \mathbf{x}, M \le a) \notag \\
&= E\bigl(E(Z_i Z_j \mid Z_j, \mathbf{x}, M \le a) \mid \mathbf{x}, M \le a\bigr) - 0 \tag{27} \\
&= E\bigl(Z_j\, E(Z_i \mid Z_j, \mathbf{x}, M \le a) \mid \mathbf{x}, M \le a\bigr) \notag \\
&= E(Z_j \times 0 \mid \mathbf{x}, M \le a) \tag{28} \\
&= 0, \tag{29}
\end{align}

where (27) follows from Corollary 2.2, and (28) follows because
$(Z_i \mid Z_j, M \le a) \sim (-Z_i \mid Z_j, M \le a)$, thus
$E(Z_i \mid Z_j, M \le a) = 0$ for all $i \ne j$.

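Thus after rerandomization the covariance matrix of Z is $v_a \mathbf{I}$, hence

\begin{align*}
\operatorname{cov}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C \mid \mathbf{x}, M \le a)
&= \operatorname{cov}(\Sigma^{1/2} \mathbf{Z} \mid \mathbf{x}, M \le a) \\
&= \Sigma^{1/2} \operatorname{cov}(\mathbf{Z} \mid \mathbf{x}, M \le a)\, (\Sigma^{1/2})' \\
&= v_a \operatorname{cov}(\bar{\mathbf{X}}_T - \bar{\mathbf{X}}_C \mid \mathbf{x}).
\end{align*}
□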
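
The facts about Z used in this proof can be illustrated by simulation. The sketch below draws i.i.d. standard normal vectors Z of length k = 3, keeps those with Z'Z at most a, and then estimates the variance of one element, the correlation between two elements with signs, and the correlation between their magnitudes. The values k = 3 and a = 1, the number of draws and the helper names are illustrative assumptions; v_a is computed from the expression preceding (26) using a series for the lower incomplete gamma function.

```python
import math
import random

def lower_gamma(s, x, terms=200):
    """Lower incomplete gamma function gamma(s, x) via its power series."""
    term = 1.0 / s
    total = term
    for n in range(1, terms):
        term *= x / (s + n)
        total += term
    return x ** s * math.exp(-x) * total

def v_a(k, a):
    """(2 / k) * gamma(k/2 + 1, a/2) / gamma(k/2, a/2), as in (26)."""
    return (2.0 / k) * lower_gamma(k / 2 + 1, a / 2) / lower_gamma(k / 2, a / 2)

def correlation(u, v):
    """Sample correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)

k, a = 3, 1.0
rng = random.Random(3)
kept = []
for _ in range(200_000):
    z = [rng.gauss(0, 1) for _ in range(k)]
    if sum(zj * zj for zj in z) <= a:   # accept when M = Z'Z <= a
        kept.append(z)

z1 = [z[0] for z in kept]
z2 = [z[1] for z in kept]
mean_z1 = sum(z1) / len(z1)
var_z1 = sum((v - mean_z1) ** 2 for v in z1) / len(z1)

signed_corr = correlation(z1, z2)
magnitude_corr = correlation([abs(v) for v in z1], [abs(v) for v in z2])
print(var_z1, v_a(k, a), signed_corr, magnitude_corr)
```

In this setup the estimated variance should be close to v_a and the signed correlation close to zero, while the correlation between magnitudes should be negative, matching the remark that the elements of Z are no longer independent after enforcing the threshold.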